1 Introduction

It is generally accepted that many time series of practical interest exhibit
strong dependence, i.e., long memory. For such series, the sample
autocorrelations decay slowly and log-log periodogram plots indicate a straight-line
relationship. This necessitates a class of models for describing such behavior.
A popular class of such models is the autoregressive fractionally integrated
moving average (ARFIMA) (see [Ade74], [GJ80], [Hos81]), which is a linear
process. However, there is also a need for nonlinear long memory models. For
example, series of returns on financial assets typically tend to show zero
correlation, whereas their squares or absolute values exhibit long memory. See,
e.g., [DGE93]. Furthermore, the search for a realistic mechanism for generating
long memory has led to the development of other nonlinear long memory
models (shot noise, special cases of which include the Parke and Taqqu-Levy
models). In this chapter, we will present several nonlinear long memory models,
and discuss the properties of the models, as well as associated parametric and
semiparametric estimators.

Long memory has no universally accepted definition; nevertheless, the
most commonly accepted definition of long memory for a weakly stationary
process $X = \{X_t, t \in \mathbb{Z}\}$ is the regular variation of the autocovariance
function: there exist $H \in (1/2, 1)$ and a slowly varying function $L$ such that

$$\operatorname{cov}(X_0, X_t) = L(t)\,|t|^{2H-2}. \qquad (1)$$

Under this condition, it holds that:

$$\lim_{n \to \infty} n^{-2H} L(n)^{-1} \operatorname{var}\Big(\sum_{k=1}^{n} X_k\Big) = 1/(2H(2H-1)). \qquad (2)$$

The condition (2) does not imply (1). Nevertheless, we will take (2) as an
alternate definition of long memory. In both cases, the index $H$ will be referred
to as the Hurst index of the process $X$. This definition can be expressed in
terms of the parameter $d = H - 1/2$, which we will refer to as the memory
parameter. The most famous long memory processes are fractional Gaussian
noise and the ARFIMA($p, d, q$) process, whose memory parameter is $d$ and
Hurst index is $H = 1/2 + d$. See for instance [Taq03] for a definition of these
processes.

The second-order properties of a stationary process are not sufficient to
characterize it, unless it is a Gaussian process. Processes which are linear with
respect to an i.i.d. sequence (strict sense linear processes) are also relatively
well characterized by their second-order structure. In particular, weak
convergence of the partial sum process of a Gaussian or strict sense linear long
memory process $\{X_t\}$ with Hurst index $H$ can be easily derived. Define
$S_n(t) = \sum_{k=1}^{[nt]} (X_k - E[X_k])$ in discrete time or $S_n(t) = \int_0^{nt} (X_s - E[X_s])\,ds$
in continuous time. Then $\operatorname{var}(S_n(1))^{-1/2} S_n(t)$ converges in distribution to a
constant times the fractional Brownian motion with Hurst index $H$, that is
the Gaussian process $B_H$ with covariance function

$$\operatorname{cov}(B_H(s), B_H(t)) = \frac{1}{2}\{|s|^{2H} - |t-s|^{2H} + |t|^{2H}\}.$$
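The covariance function above is easy to evaluate directly. The following minimal Python sketch (the function names `fbm_cov` and `increment_cov` are ours, introduced only for illustration) computes it, together with the covariance between unit increments of $B_H$. For $H = 1/2$ the increments are uncorrelated, as for standard Brownian motion, while for $H > 1/2$ they are positively correlated at all lags.

```python
def fbm_cov(s: float, t: float, hurst: float) -> float:
    """Covariance of fractional Brownian motion B_H at times s and t."""
    if not 0.0 < hurst < 1.0:
        raise ValueError("hurst must lie in (0, 1)")
    two_h = 2.0 * hurst
    return 0.5 * (abs(s) ** two_h - abs(t - s) ** two_h + abs(t) ** two_h)


def increment_cov(lag: int, hurst: float) -> float:
    """Covariance of B_H(1) - B_H(0) and B_H(lag + 1) - B_H(lag)."""
    # Expand the covariance of the two increments by bilinearity.
    return (fbm_cov(1, lag + 1, hurst) - fbm_cov(1, lag, hurst)
            - fbm_cov(0, lag + 1, hurst) + fbm_cov(0, lag, hurst))
```

For instance, `increment_cov(10, 0.5)` vanishes, whereas `increment_cov(10, 0.8)` is strictly positive.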

In this paper, we will introduce nonlinear long memory processes, whose 
second order structure is similar to that of Gaussian or linear processes, but 
which may differ greatly from these processes in many other aspects. In Sec- 
tion 2, we will present these models and their second-order properties, and 
the weak convergence of their partial sum process. These models include con- 
ditionally heteroscedastic processes (Section 2.1) and models related to point 
processes (Section 2.2). In Section 3, we will consider the problem of estimat- 
ing the Hurst index or memory parameter of these processes. 



2 Models 

2.1 Conditionally heteroscedastic models 

These models are defined by

$$X_t = \sigma_t v_t, \qquad (3)$$

where $\{v_t\}$ is an independent identically distributed series with finite variance
and $\sigma_t^2$ is the so-called volatility. We now give examples.

LMSV and LMSD 

The Long Memory Stochastic Volatility (LMSV) and Long Memory Stochastic
Duration (LMSD) models are defined by Equation (3), where $\sigma_t^2 = \exp(h_t)$
and $\{h_t\}$ is an unobservable Gaussian long memory process with memory
parameter $d \in (0, 1/2)$, independent of $\{v_t\}$. The multiplicative innovation series
$\{v_t\}$ is assumed to have zero mean in the LMSV model, and positive support
with unit mean in the LMSD model. The LMSV model was first introduced by
[BCdL98] and [Har98] to describe returns on financial assets, while the LMSD
model was proposed by [DHH05] to describe durations between transactions
on stocks.
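To make the LMSV definition concrete, here is a minimal simulation sketch in Python. It rests on several assumptions that are ours and not part of the model definition: $\{h_t\}$ is taken to be a Gaussian ARFIMA(0, d, 0) process approximated by a truncated moving average, $\{v_t\}$ is standard Gaussian, and the names `arfima_ma_weights`, `simulate_lmsv`, `sigma_h` and `n_weights` are hypothetical.

```python
import math
import random


def arfima_ma_weights(d: float, n_weights: int) -> list[float]:
    """First n_weights coefficients psi_j of the expansion of (1 - B)^(-d)."""
    psi = [1.0]
    for j in range(1, n_weights):
        psi.append(psi[-1] * (j - 1 + d) / j)  # psi_j = psi_{j-1} (j-1+d)/j
    return psi


def simulate_lmsv(n: int, d: float, sigma_h: float = 0.5,
                  n_weights: int = 500, seed: int = 0) -> list[float]:
    """Simulate X_t = sigma_t v_t with sigma_t^2 = exp(h_t) (approximate)."""
    rng = random.Random(seed)
    psi = arfima_ma_weights(d, n_weights)
    # Gaussian innovations driving the long memory signal h_t.
    eta = [rng.gauss(0.0, sigma_h) for _ in range(n + n_weights)]
    x = []
    for t in range(n):
        # Truncated MA(infinity) representation of ARFIMA(0, d, 0).
        h_t = sum(psi[j] * eta[t + n_weights - j] for j in range(n_weights))
        v_t = rng.gauss(0.0, 1.0)              # zero-mean innovation (LMSV)
        x.append(math.exp(h_t / 2.0) * v_t)    # sigma_t = exp(h_t / 2)
    return x
```

The truncation at `n_weights` terms only approximates long memory; a larger value gives a better approximation at a quadratic cost in this naive implementation.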

Using the moment generating function of a Gaussian distribution, it can
be shown (see [Har98]) for the LMSV/LMSD model that for any real $s$ such
that $E[|v_t|^s] < \infty$,

$$\rho_s(j) \sim C_s j^{2d-1}, \quad j \to \infty,$$

where $\rho_s(j)$ denotes the autocorrelation of $\{|X_t|^s\}$ at lag $j$, with the convention
that $s = 0$ corresponds to the logarithmic transformation. As shown in [SV02],
the same result holds under more general conditions without the requirement
that $\{h_t\}$ be Gaussian.

In the LMSV model, assuming that $\{h_t\}$ and $\{v_t\}$ are functions of a
multivariate Gaussian process, [Rob01] obtained similar results on the
autocorrelations of $\{|X_t|^s\}$ with $s > 0$ even if $\{h_t\}$ is not independent of $\{v_t\}$.
Similar results were obtained in [SV02], allowing for dependence between $\{h_t\}$
and $\{v_t\}$.

The LMSV process is an uncorrelated sequence, but powers of LMSV or
LMSD may exhibit long memory. [SV02] proved the convergence of the
centered and renormalized partial sums of any absolute power of these processes
to fractional Brownian motion with Hurst index 1/2 (that is, standard Brownian
motion) in the case where they have short memory.

FIEGARCH 

The weakly stationary FIEGARCH model was proposed by [BM96]. The
FIEGARCH model, which is observation-driven, is a long-memory extension of the
EGARCH (exponential GARCH) model of [Nel91]. The FIEGARCH model
for returns $\{X_t\}$ takes the form (3), where the innovation series $\{v_t\}$ are i.i.d.
with zero mean and a symmetric distribution, and

$$\log \sigma_t^2 = \omega + \sum_{j=1}^{\infty} a_j g(v_{t-j}) \qquad (4)$$

with $g(x) = \theta x + \gamma(|x| - E|v_t|)$, $\omega > 0$, $\theta \in \mathbb{R}$, $\gamma \in \mathbb{R}$, and real constants
$a_j$ such that the process $\{\log \sigma_t^2\}$ has long memory with memory parameter
$d \in (0, 1/2)$. If $\theta$ is nonzero, the model allows for a so-called leverage effect,
whereby the sign of the current return may have some bearing on the future
volatility. In the original formulation of [BM96], the $\{a_j\}$ are the AR($\infty$)
coefficients of an ARFIMA($p, d, q$) process.
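The role of $\theta$ in the leverage effect can be seen by evaluating $g$ at innovations of opposite sign. The sketch below is illustrative only: it assumes standard Gaussian innovations, so that $E|v_t| = \sqrt{2/\pi}$, and evaluates a finite truncation of the sum in (4). The names `g` and `log_volatility` are ours.

```python
import math

# E|v| for a standard Gaussian innovation (an assumption of this sketch).
E_ABS_STD_NORMAL = math.sqrt(2.0 / math.pi)


def g(x: float, theta: float, gamma: float,
      mean_abs: float = E_ABS_STD_NORMAL) -> float:
    """g(x) = theta * x + gamma * (|x| - E|v_t|)."""
    return theta * x + gamma * (abs(x) - mean_abs)


def log_volatility(past_v: list[float], a: list[float], omega: float,
                   theta: float, gamma: float) -> float:
    """Truncated version of (4); past_v[0] is v_{t-1}, a[0] is a_1."""
    return omega + sum(a_j * g(v, theta, gamma) for a_j, v in zip(a, past_v))
```

Since $g(x) - g(-x) = 2\theta x$, a negative $\theta$ makes a negative innovation raise future log-volatility more than a positive innovation of the same size.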

As was the case for the LMSV model, here we can once again express the
log squared returns as in (18) with $\mu = E[\log v_t^2] + \omega$, $u_t = \log v_t^2 - E[\log v_t^2]$, and
$h_t = \log \sigma_t^2 - \omega$. Here, however, the processes $\{h_t\}$ and $\{u_t\}$ are not mutually
independent. The results of [SV02] also apply here, and in particular, the
processes $\{|X_t|^u\}$, $\{\log(X_t^2)\}$ and $\{\sigma_t\}$ have the same memory parameter $d$.
ARCH($\infty$) and FIGARCH

In ARCH($\infty$) models, the innovation series $\{v_t\}$ is assumed to have zero mean
and unit variance, and the conditional variance is taken to be a weighted sum
of present and past squared returns:

$$\sigma_t^2 = \omega + \sum_{j=1}^{\infty} a_j X_{t-j}^2, \qquad (5)$$

where $\omega, a_j, j = 1, 2, \dots$ are nonnegative constants. The general framework
leading to (3) and (5) was introduced by [Rob91]. [KL03] have shown that
$\sum_{j=1}^{\infty} a_j \le 1$ is a necessary condition for existence of a strictly stationary
solution to equations (3), (5), while [GKL00] showed that $\sum_{j=1}^{\infty} a_j < 1$ is
a sufficient condition for the existence of a strictly stationary solution. If
$\sum_{j=1}^{\infty} a_j = 1$, the existence of a strictly stationary solution has been proved by
[KL03] only in the case where the coefficients $a_j$ decay exponentially fast. In
any case, if a stationary solution exists, its variance, if finite, must be equal
to $\omega (1 - \sum_{k=1}^{\infty} a_k)^{-1}$, so that it cannot be finite if $\sum_{k=1}^{\infty} a_k = 1$ and $\omega > 0$. If
$\omega = 0$, then the process which is identically equal to zero is a solution, but it
is not known whether a nontrivial solution exists.

In spite of a huge literature on the subject, the existence of a strictly or
weakly stationary solution to (3), (5) such that $\{\sigma_t^2\}$, $\{|X_t|^u\}$ or $\{\log(X_t^2)\}$
has long memory is still an open question. If $\sum_{j=1}^{\infty} a_j < 1$, and the coefficients
$a_j$ decay sufficiently slowly, [GKL00] found that it is possible in such a model
to get hyperbolic decay in the autocorrelations $\{\rho_r\}$ of the squares, though
the rates of decay they were able to obtain were proportional to $r^{-\theta}$ with
$\theta > 1$. Such autocorrelations are summable, unlike the autocorrelations of a
long-memory process with positive memory parameter. For instance, if the
weights $\{a_j\}$ are proportional to those given by the AR($\infty$) representation of
an ARFIMA($p, d, q$) model, then $\theta = 1 + d$. If $\sum_{j=1}^{\infty} a_j = 1$, then the process
has infinite variance so long memory as defined here is irrelevant.

Let us mention for historical interest the FIGARCH (fractionally integrated
GARCH) model which appeared first in [BBM96]. In the FIGARCH
model, the weights $\{a_j\}$ are given by the AR($\infty$) representation of an
ARFIMA($p, d, q$) model, with $d \in (0, 1/2)$, which implies that $\sum_{j=1}^{\infty} a_j = 1$,
hence the very existence of FIGARCH series is an open question, and in any
case, if it exists, it cannot be weakly stationary. The lack of weak stationarity
of the FIGARCH model was pointed out by [BBM96]. Once again, at the time
of writing this paper, we are not aware of any rigorous result on this process
or on any ARCH($\infty$) process with long memory.
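The fact that the FIGARCH weights sum to one can be checked numerically in the simplest case. The sketch below assumes the pure fractional case $p = q = 0$, where the weights are the coefficients of $1 - (1 - B)^d$; the function names `figarch_weights` and `remainder` are ours. The partial sums approach one only at the hyperbolic rate $n^{-d}$, which the closed-form remainder makes explicit.

```python
import math


def figarch_weights(d: float, n: int) -> list[float]:
    """First n coefficients a_1, ..., a_n of 1 - (1 - B)^d, for 0 < d < 1."""
    weights = []
    pi_j = 1.0                       # pi_0, coefficient of B^0 in (1 - B)^d
    for j in range(1, n + 1):
        pi_j *= (j - 1 - d) / j      # pi_j = pi_{j-1} (j - 1 - d) / j
        weights.append(-pi_j)        # a_j = -pi_j, positive for 0 < d < 1
    return weights


def remainder(d: float, n: int) -> float:
    """1 - (a_1 + ... + a_n) = Gamma(n + 1 - d) / (Gamma(1 - d) Gamma(n + 1))."""
    return math.exp(math.lgamma(n + 1 - d) - math.lgamma(1 - d)
                    - math.lgamma(n + 1))
```

For example, with $d = 0.4$ the first 100000 weights sum to roughly 0.993: the sum is below one for every finite truncation but converges to one.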

LARCH 

Since the ARCH structure (apparently) fails to produce long memory, an
alternative definition of heteroskedasticity has been considered in which long
memory can be proved rigorously. [GS02] considered models which satisfy the
equation $X_t = \zeta_t A_t + B_t$, where $\{\zeta_t\}$ is a sequence of i.i.d. centered
random variables with unit variance and $A_t$ and $B_t$ are linear in $\{X_t\}$ instead of
quadratic as in the ARCH specification. This model nests the LARCH model
introduced by [Rob91], obtained for $B_t = 0$. The advantage of this model is
that it can exhibit long memory in the conditional mean $B_t$ and/or in the
conditional variance $A_t$, possibly with different memory parameters. See [GS02,
Corollary 4.4]. The process $\{X_t\}$ also exhibits long memory with a memory
parameter depending on the memory parameters of the mean and the
conditional variance [GS02, Theorem 5.4]. If the conditional mean exhibits long
memory, then the partial sum process converges to the fractional Brownian
motion, and it converges to the standard Brownian motion otherwise. See
[GS02, Theorem 6.2]. The squares $\{X_t^2\}$ may also exhibit long memory, and
their partial sum process converges either to the fractional Brownian motion
or to a non-Gaussian self-similar process. This family of processes is thus very
flexible. An extension to the multivariate case is given in [DTW05].

We conclude this section with the following remark. Even though these
processes are very different from Gaussian or linear processes, they share with
weakly dependent processes the Gaussian limit and the fact that weak limits
and $L^2$ limits have consistent normalisations, in the sense that, if $\xi_n$ denotes
one of the usual statistics computed on a time series, there exists a sequence
$v_n$ such that $v_n \xi_n$ converges weakly to a non degenerate distribution and
$v_n^2 E[\xi_n^2]$ converges to a positive limit (which is the variance of the asymptotic
distribution). In the next subsection, we introduce models for which this is no
longer true.

2.2 Shot noise processes 

General forms of the shot-noise process have been considered for a long time; 
see for instance [Tak54], [Dal71]. Long memory shot noise processes have been 
introduced more recently; an early reference seems to be [GMS93]. We present 
some examples of processes related to shot noise which may exhibit long mem- 
ory. For simplicity and brevity, we consider only stationary processes. 

Let $\{t_j, j \in \mathbb{Z}\}$ be the points of a stationary point process on the line,
numbered for instance in such a way that $t_{-1} < 0 \le t_0$, and for $t > 0$, let
$N(t) = \sum_{j \ge 0} 1_{\{t_j \le t\}}$ be the number of points between time zero and $t$. Define
then

$$X_t = \sum_{j \in \mathbb{Z}} \epsilon_j 1_{\{t_j \le t < t_j + \eta_j\}}, \quad t \ge 0. \qquad (6)$$

In this model, the shocks $\{\epsilon_j\}$ are an i.i.d. sequence; they are generated at
birth times $\{t_j\}$ and have durations $\{\eta_j\}$. The observation at time $t$ is the
sum of all surviving present and past shocks. In model (6), we can take time
to be continuous, $t \in \mathbb{R}$, or discrete, $t \in \mathbb{Z}$. This will be made precise later for
each model considered. We now describe several well known special cases of
model (6).

1. Renewal-reward process; [TL86], [Liu00].

The durations are exactly the interarrival times of the renewal process:
$\eta_0 = t_0$, $\eta_j = t_{j+1} - t_j$, and the shocks are independent of their birth
times. Then there is exactly one surviving shock at time $t$:

$$X_t = \epsilon_{N(t)}. \qquad (7)$$

2. ON-OFF model; [TWS97].

This process consists of alternating ON and OFF periods with independent
durations. Let $\{\eta_k\}_{k \ge 1}$ and $\{\zeta_k\}_{k \ge 1}$ be two independent i.i.d.
sequences of positive random variables with finite mean. Let $t_0$ be
independent of these sequences and define $t_j = t_0 + \sum_{k=1}^{j} (\eta_k + \zeta_k)$. The shocks
$\epsilon_j$ are deterministic and equal to 1. Their duration is $\eta_j$. The $\eta_j$s are the
ON periods and the $\zeta_j$s are the OFF periods. The first interval $t_0$ can also
be split into two successive ON and OFF periods $\eta_0$ and $\zeta_0$. The process
$X$ can be expressed as

$$X_t = 1_{\{t_{N(t)} \le t < t_{N(t)} + \eta_{N(t)}\}}. \qquad (8)$$

3. Error duration process; [Par99].

This process was introduced to model some macroeconomic data. The
birth times are deterministic, namely $t_j = j$, the durations $\{\eta_j\}$ are i.i.d.
with finite mean and

$$X_t = \sum_{j \le t} \epsilon_j 1_{\{t < j + \eta_j\}}. \qquad (9)$$

4. Infinite Source Poisson model.

If the $t_j$ are the points of a homogeneous Poisson process, the
durations $\{\eta_j\}$ are i.i.d. with finite mean and $\epsilon_j = 1$, we obtain the infinite
source Poisson model or M/G/$\infty$ input model considered among others
in [MRRS02].

[MRR02] have considered a variant of this process where the shocks
(referred to as transmission rates in this context) are random, and possibly
contemporaneously dependent with durations.

In the first two models, the durations satisfy $\eta_j \le t_{j+1} - t_j$, hence are not
independent of the point process of arrivals (which is here a renewal process).
Nevertheless $\eta_j$ is independent of the past points $\{t_k, k \le j\}$. The process
can be defined for all $t \ge 0$ without considering negative birth times and
shocks. In the last two models, the shocks and durations are independent of
the renewal process, and any past shock may contribute to the value of the
process at time $t$.
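The general definition (6) translates directly into code once a finite set of birth times, shocks and durations is given. The following Python sketch (the function name `shot_noise` and its argument names are ours) evaluates the sum of surviving shocks at a time $t$. It makes no attempt to generate a stationary version of the process.

```python
def shot_noise(t: float, births: list[float], shocks: list[float],
               durations: list[float]) -> float:
    """Evaluate X_t as in (6): sum of the shocks still alive at time t.

    Shock j is alive on the interval [births[j], births[j] + durations[j]).
    """
    return sum(eps for b, eps, eta in zip(births, shocks, durations)
               if b <= t < b + eta)
```

With deterministic birth times `births = [0, 1, 2, 3]`, this is a finite stretch of the error duration process (9), and several shocks can overlap. With durations equal to the interarrival times, as in the renewal-reward process, exactly one shock survives at each time.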
Stationarity and second order properties

• The renewal-reward process (7) is strictly stationary since the renewal
process is stationary and the shocks are i.i.d. It is moreover weakly stationary if
the shocks have finite variance. Then $E[X_t] = E[\epsilon_1]$ and

$$\operatorname{cov}(X_0, X_t) = E[\epsilon_1^2]\, P(\eta_0 > t) = \lambda E[\epsilon_1^2]\, E[(\eta_1 - t)_+], \qquad (10)$$

where $\eta_0$ has the delay distribution and $\lambda = E[t_1 - t_0]^{-1}$ is the intensity of the
stationary renewal process. Note that this relation would be true for a general
stationary point process. Cf. for instance [TL86] or [HHS04].

• The stationary version of the ON-OFF process was studied in [HRS98]. The first
ON and OFF periods $\eta_0$ and $\zeta_0$ can be defined in such a way that the process
$X$ is stationary. Let $F_{on}$ and $F_{off}$ be the distribution functions of the ON and
OFF periods $\eta_1$ and $\zeta_1$. [HRS98, Theorem 4.3] show that if $1 - F_{on}$ is regularly
varying with index $\alpha \in (1, 2)$ and $1 - F_{off}(t) = o(1 - F_{on}(t))$ as $t \to \infty$, then

$$\operatorname{cov}(X_0, X_t) \sim c P(\eta_0 > t) = c \lambda E[(\eta_1 - t)_+]. \qquad (11)$$

• Consider now the case when the durations are independent of the birth
times. To be precise, assume that $\{(\eta_j, \epsilon_j)\}$ is an i.i.d. sequence of random
vectors, independent of the stationary point process of points $\{t_j\}$. Then the
process $\{X_t\}$ is strictly stationary as long as $E[\eta_1] < \infty$, and has finite variance
if $E[\epsilon_1^2 \eta_1] < \infty$. Then $E[X_t] = \lambda E[\epsilon_1 \eta_1]$ and

$$\operatorname{cov}(X_0, X_t) = \lambda E[\epsilon_1^2 (\eta_1 - t)_+]
+ \{\operatorname{cov}(\epsilon_1 N(-\eta_1, 0], \epsilon_2 N(t - \eta_2, t]) - \lambda E[\epsilon_1 \epsilon_2 (\eta_1 \wedge (\eta_2 - t)_+)]\},$$

where $\lambda$ is the intensity of the stationary point process, i.e. $\lambda^{-1} = E[t_1 - t_0]$. The
last term has no known general expression for a general point process, but it
vanishes in two particular cases:

if $N$ is a homogeneous Poisson point process;
if $\epsilon_1$ is centered and independent of $\eta_1$.

In the latter case (10) holds, and in the former case, we obtain a formula
which generalizes (10):

$$\operatorname{cov}(X_0, X_t) = \lambda E[\epsilon_1^2 (\eta_1 - t)_+]. \qquad (12)$$

We now see that second order long memory can be obtained if (10) holds and
the durations have regularly varying tails with index $\alpha \in (1, 2)$ or,

$$E[\epsilon_1^2 1_{\{\eta_1 > t\}}] = \ell(t) t^{-\alpha}. \qquad (13)$$

Thus, if (13) and either (11) or (12) hold, then $X$ has long memory with Hurst
index $H = (3 - \alpha)/2$ since

$$\operatorname{cov}(X_0, X_t) \sim \frac{\lambda}{\alpha - 1} \ell(t) t^{1-\alpha}. \qquad (14)$$

Examples of interest in teletraffic modeling where $\epsilon_1$ and $\eta_1$ are not
independent but (13) holds are provided in [MRR02] and [FRS05].

We conjecture that (14) holds in a more general framework, at least if the
interarrival times of the point process have finite variance.

Weak convergence of partial sums 

This class of long memory processes exhibits a very distinguishing feature.
Instead of converging weakly to a process with finite variance, dependent
stationary increments such as the fractional Brownian motion, the partial sums
of some of these processes have been shown to converge to an $\alpha$-stable Lévy
process, that is, an $\alpha$-stable process with independent and stationary
increments. Here again there is no general result, but such a convergence is easy to
prove under restrictive assumptions. Define

$$S_T(t) = \int_0^{Tt} \{X_s - E[X_s]\}\,ds.$$

Then it is known in the particular cases described above that the finite
dimensional distributions of the process $\ell(T) T^{-1/\alpha} S_T$ (for some slowly varying
function $\ell$) converge weakly to those of an $\alpha$-stable process. This was proved in
[TL86] for the renewal reward process, in [MRRS02] for the ON-OFF and
infinite source Poisson processes when the shocks are constant. A particular case
of dependent shocks and durations is considered in [MRR02]. [HHS04] proved
the result in discrete time for the error duration process; the adaptation to the
continuous time framework is straightforward. It is also probable that such a
convergence holds when the underlying point process is more general.

Thus, these processes are examples of second order long memory processes
with Hurst index $H \in (1/2, 1)$ such that $T^{-H} S_T(t)$ converges in probability to
zero. This behaviour is very surprising and might be problematic in statistical
applications, as illustrated in Section 3.
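The degeneracy of $T^{-H} S_T(t)$ comes from a mismatch of exponents: the partial sums are of order $T^{1/\alpha}$, up to a slowly varying factor, while the second order normalisation is $T^H$ with $H = (3 - \alpha)/2$. The short Python sketch below (function names are ours, and slowly varying factors are ignored) compares the two exponents. The gap $H - 1/\alpha = (\alpha - 1)(2 - \alpha)/(2\alpha)$ is strictly positive for every $\alpha \in (1, 2)$.

```python
def hurst_index(alpha: float) -> float:
    """Second order Hurst index H = (3 - alpha) / 2 for tail index alpha."""
    return (3.0 - alpha) / 2.0


def stable_rate(alpha: float) -> float:
    """Exponent 1 / alpha of the normalisation giving a stable limit."""
    return 1.0 / alpha


def exponent_gap(alpha: float) -> float:
    """H - 1/alpha, written in the factorised form (alpha-1)(2-alpha)/(2 alpha)."""
    return (alpha - 1.0) * (2.0 - alpha) / (2.0 * alpha)
```

For $\alpha = 1.5$, for instance, $H = 0.75$ while $1/\alpha \approx 0.667$, so dividing by $T^{H}$ over-normalises the partial sums.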

It must also be noted that convergence does not hold in the space $\mathcal{D}$ of
right-continuous, left-limited functions endowed with the $J_1$ topology, since a
sequence of processes with continuous paths which converges in distribution in
this sense must converge to a process with continuous paths. It was proved
in [RvdB00, Theorem 4.1] that this convergence holds in the $M_1$ topology for
the infinite source Poisson process. For a definition and application of the $M_1$
topology in queuing theory, see [Whi02].

Slow growth and fast growth 

Another striking feature of these processes is the slow growth versus fast
growth phenomenon, first noticed by [TL86] for the renewal-reward process
and more rigorously investigated by [MRRS02] for the ON-OFF and infinite
source Poisson process (see footnote 3). Consider $M$ independent copies $X^{(i)}$,
$1 \le i \le M$, of these processes and denote

$$A_{M,T}(t) = \sum_{i=1}^{M} \int_0^{Tt} \{X_s^{(i)} - E[X_s^{(i)}]\}\,ds.$$

If $M$ depends on $T$, then, according to the rate of growth of $M$ with respect to
$T$, a stable or Gaussian limit can be obtained. More precisely, the slow growth
and fast growth conditions are, up to slowly varying functions, $MT^{1-\alpha} \to 0$
and $MT^{1-\alpha} \to \infty$, respectively. In other terms, the slow and fast growth
conditions are characterized by $\operatorname{var}(A_{M,T}(1)) \ll b(MT)$ and $\operatorname{var}(A_{M,T}(1)) \gg
b(MT)$, respectively, where $b$ is the inverse of the quantile function of the
durations.

Under the slow growth condition, the finite dimensional distributions of
$L(MT)(MT)^{-1/\alpha} A_{M,T}$ converge to those of a Lévy $\alpha$-stable process, where $L$
is a slowly varying function. Under the fast growth condition, the sequence of
processes $T^{-H} \ell^{-1/2}(T) M^{-1/2} A_{M,T}$ converges, in the space $\mathcal{D}(\mathbb{R}_+)$ endowed
with the $J_1$ topology, to the fractional Brownian motion with Hurst index
$H = (3 - \alpha)/2$. It is thus seen that under the fast growth condition, the
behaviour of a Gaussian long memory process with Hurst index $H$ is recovered.

Non stationary versions 

If the sum defining the process $X$ in (6) is limited to non negative indices
$j$, then the sum always has a finite number of terms and there is no
restriction on the distribution of the interarrival times $t_{j+1} - t_j$ and the durations
$\eta_j$. These models can then be nonstationary in two ways: either because of
initialisation, in which case a suitable choice of the initial distribution can
make the process stationary; or because these processes are non stable and
have no stationary distribution. The latter case arises when the interarrival
times and/or the durations have infinite mean. These models were studied
by [RR00] and [MR04] in the case where the point process of arrivals is a
renewal process. Contrary to the stationary case, where heavy tailed durations
imply non Gaussian limits, the limiting process of the partial sums has non
stationary increments and can be Gaussian in some cases.

3 Actually, in the case of the Infinite Source Poisson process, [MRRS02] consider a
single process but with an increasing rate $\lambda$ depending on $T$, rather than
superposition of independent copies. The results obtained are nevertheless of the same
nature.

2.3 Long Memory in Counts

The time series of counts of the number of transactions in a given fixed interval
of time is of interest in financial econometrics. Empirical work suggests that
such series may possess long memory. See [DHH05]. Since the counts are
induced by the durations between transactions, it is of interest to study the
properties of durations, how these properties generate long memory in counts,
and whether there is a connection between potential long memory in durations
and long memory in counts.

The event times determine a counting process $N(t)$ = Number of events
in $(0, t]$. Given any fixed clock-time spacing $\Delta t > 0$, we can form the time
series $\{\Delta N_{t'}\} = \{N(t' \Delta t) - N[(t' - 1) \Delta t]\}$ for $t' = 1, 2, \dots$, which counts the
number of events in the corresponding clock-time intervals of width $\Delta t$. We
will refer to the $\{\Delta N_{t'}\}$ as the counts. Let $\tau_k > 0$ denote the waiting time
(duration) between the $(k-1)$st and the $k$th transaction.
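The passage from durations to counts can be sketched in a few lines of Python. The function name `counts_from_durations` and its arguments are ours; the sketch assumes that the first event occurs at time $\tau_1$ after the origin and bins event times into the intervals $((t'-1)\Delta t, t'\Delta t]$.

```python
import math


def counts_from_durations(durations: list[float], dt: float,
                          n_intervals: int) -> list[int]:
    """Counts of events in ((k - 1) dt, k dt] for k = 1, ..., n_intervals."""
    counts = [0] * n_intervals
    event_time = 0.0
    for tau in durations:
        event_time += tau                       # time of the next event
        idx = math.ceil(event_time / dt) - 1    # zero-based interval index
        if idx >= n_intervals:
            break                               # later events are out of range
        counts[idx] += 1
    return counts
```

Event times that fall exactly on a bin boundary are assigned to the interval they close, which matches the half-open convention $(0, t]$. With floating point durations such ties are fragile, so exact boundaries should not be relied upon in practice.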

We give some preliminary definitions taken from [DVJ03] . 

Definition 1. A point process $N(t) = N(0, t]$ is stationary if for every
$r = 1, 2, \dots$ and all bounded Borel sets $A_1, \dots, A_r$, the joint distribution of
$\{N(A_1 + t), \dots, N(A_r + t)\}$ does not depend on $t \in [0, \infty)$.

A second order stationary point process is long-range count dependent
(LRcD) if

$$\lim_{t \to \infty} \frac{\operatorname{var}(N(t))}{t} = \infty.$$

A second order stationary point process $N(t)$ which is LRcD has Hurst
index $H \in (1/2, 1)$ given by

$$H = \sup\Big\{h : \limsup_{t \to \infty} \frac{\operatorname{var}(N(t))}{t^{2h}} = \infty\Big\}.$$

Thus if the counts $\{\Delta N_{t'}\}_{t'=-\infty}^{\infty}$ on intervals of any fixed width $\Delta t > 0$
are long-range dependent (LRD) with memory parameter $d$ then the counting
process $N(t)$ must be LRcD with Hurst index $H = d + 1/2$. Conversely, if $N(t)$
is an LRcD process with Hurst index $H$, then $\{\Delta N_{t'}\}$ cannot have exponentially
decaying autocorrelations, and under the additional assumption of a power law
decay of these autocorrelations, $\{\Delta N_{t'}\}$ is LRD with memory parameter
$d = H - 1/2$.

There exists a probability measure $P^0$ under which the doubly infinite
sequence of durations $\{\tau_k\}_{k=-\infty}^{\infty}$ is a stationary time series, i.e., the joint
distribution of any subcollection of the $\{\tau_k\}$ depends only on the lags
between the entries. On the other hand, the point process $N$ on the real line is
stationary under the measure $P$. A fundamental fact about point processes is
that in general (a notable exception is the Poisson process) there is no single
measure under which both the point process $N$ and the durations $\{\tau_k\}$ are
stationary, i.e., in general $P$ and $P^0$ are not the same. Nevertheless, there is a
one-to-one correspondence between the class of measures $P^0$ that determine
a stationary duration sequence and the class of measures $P$ that determine
a stationary point process. The measure $P^0$ corresponding to $P$ is called the
Palm distribution. The counts are stationary under $P$, while the durations are
stationary under $P^0$.

We now present an important theoretical result obtained by [Dal99].

Theorem 1. A stationary renewal point process is LRcD and has Hurst
index $H = (1/2)(3 - \alpha)$ under $P$ if the interarrival time has tail index $1 < \alpha < 2$
under $P^0$.

Theorem 1 establishes a connection between the tail index of a duration
process and the persistence of the counting process. According to the theorem,
the counting process will be LRcD if the duration process is iid with infinite
variance. Here, the memory parameter of the counts is completely determined
by the tail index of the durations.

This prompts the question as to whether long memory in the counts can
be generated solely by dependence in finite-variance durations. An answer in
the affirmative was given by [DRV00], who provide an example outside of the
framework of the popular econometric models. We now present a theorem on
the long-memory properties of counts generated by durations following the
LMSD model. The theorem is a special case of a result proved in [DHSW05],
who give sufficient conditions on durations to imply long memory in counts.

Theorem 2. If the durations $\{\tau_k\}$ are generated by the LMSD process with
memory parameter $d$, then the induced counting process $N(t)$ has Hurst index
$H = 1/2 + d$, i.e. satisfies $\operatorname{var}(N(t)) \sim C t^{2d+1}$ under $P$ as $t \to \infty$, where
$C > 0$.



3 Estimation of the Hurst index or memory parameter

A weakly stationary process with autocovariance function $\gamma$ satisfying (1) has
a spectral density $f$ defined by

$$f(x) = \frac{1}{2\pi} \sum_{t \in \mathbb{Z}} \gamma(t) e^{itx}. \qquad (15)$$

This series converges uniformly on the compact subsets of $[-\pi, \pi] \setminus \{0\}$ and
in $L^1([-\pi, \pi], dx)$. Under some strengthening of condition (1), the behaviour
of the function $f$ at zero is related to the rate of decay of $\gamma$. For instance, if
we assume in addition that $L$ is ultimately monotone, we obtain the following
Tauberian result [Taq03, Proposition 4.1], with $d = H - 1/2$:

$$\lim_{x \to 0} L(1/x)^{-1} x^{2d} f(x) = \pi^{-1} \Gamma(2d) \cos(\pi d). \qquad (16)$$

Thus, a natural idea is to estimate the spectral density in order to estimate the
memory parameter $d$. The statistical tools are the discrete Fourier transform
(DFT) and the periodogram, defined for a sample $U_1, \dots, U_n$, as

$$J_U(\omega_j) = (2\pi n)^{-1/2} \sum_{t=1}^{n} U_t e^{it\omega_j}, \qquad I_U(\omega_j) = |J_U(\omega_j)|^2,$$
where $\omega_j = 2j\pi/n$, $1 \le j < n/2$ are the so-called Fourier frequencies. (Note
that for clarity the index $n$ is omitted from the notation.) In the classical
weakly stationary short memory case (when the autocovariance function is
absolutely summable), it is well known that the periodogram is an
asymptotically unbiased estimator of the spectral density $f_U$ defined in (15). This is no
longer true for second order long memory processes. [HB93] showed (in the
case where the function $L$ is continuous at zero but the extension is
straightforward) that for any fixed positive integer $j$, there exists a positive constant
$c(j, H)$ such that

$$\lim_{n \to \infty} E[I_U(\omega_j)/f_U(\omega_j)] = c(j, H).$$
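As an illustration of these tools, the following Python sketch computes the periodogram at the Fourier frequencies by a direct $O(n^2)$ sum. The normalisation $(2\pi n)^{-1/2}$ of the DFT is an assumption of this sketch, chosen to match a spectral density normalised by $1/(2\pi)$; the function name `periodogram` is ours.

```python
import cmath
import math


def periodogram(u: list[float]) -> list[tuple[float, float]]:
    """Return (omega_j, I(omega_j)) for the Fourier frequencies, 1 <= j < n/2."""
    n = len(u)
    result = []
    for j in range(1, (n + 1) // 2):
        omega = 2.0 * math.pi * j / n
        # Discrete Fourier transform at omega_j, normalised by sqrt(2 pi n).
        dft = sum(u_t * cmath.exp(1j * omega * t)
                  for t, u_t in enumerate(u, start=1))
        dft /= math.sqrt(2.0 * math.pi * n)
        result.append((omega, abs(dft) ** 2))
    return result
```

For a pure cosine at the third Fourier frequency, all the mass of the periodogram sits on the ordinate $j = 3$. For long series a fast Fourier transform would replace the direct sum.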

The previous results are true for any second order long memory process.
Nevertheless, spectral methods of estimation of the Hurst parameter, based
on the heuristic (but incorrect) assumption that the renormalised DFTs
$f_U^{-1/2}(\omega_j) J_U(\omega_j)$ are i.i.d. standard complex Gaussian, have been proposed and
theoretically justified in some cases. The most well known is the GPH
estimator of the Hurst index, introduced by [GPH83] and proved consistent and
asymptotically Gaussian for Gaussian long memory processes by [Rob95b] and
for a restricted class of linear processes by [Vel00]. Another estimator, often
referred to as the local Whittle or GSE estimator, was introduced by [Kün87]
and again proved consistent and asymptotically Gaussian by [Rob95a] for linear
long memory processes.
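A log-periodogram regression of the GPH type can be sketched as follows. The version below regresses $\log I(\omega_j)$ on $-\log(4 \sin^2(\omega_j/2))$ over the first $m$ Fourier frequencies and returns the least squares slope as an estimate of $d$ (so that the Hurst index estimate is the slope plus 1/2). The function name `gph_estimate` and its interface are ours, and no trimming or bias correction is attempted.

```python
import math


def gph_estimate(freqs: list[float], pgram: list[float], m: int) -> float:
    """Least squares slope of log I(w_j) on -log(4 sin^2(w_j / 2)), j <= m."""
    if m < 2 or m > min(len(freqs), len(pgram)):
        raise ValueError("m must satisfy 2 <= m <= number of ordinates")
    x = [-math.log(4.0 * math.sin(w / 2.0) ** 2) for w in freqs[:m]]
    y = [math.log(i) for i in pgram[:m]]
    x_bar = sum(x) / m
    y_bar = sum(y) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx
```

If the periodogram ordinates are replaced by an exact power law $C (4 \sin^2(\omega/2))^{-d}$, the regression recovers $d$ exactly, which gives a simple sanity check of the implementation.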

These estimators are built on the $m$ first log-periodogram ordinates, where
$m$ is an intermediate sequence, i.e. $1/m + m/n \to 0$ as $n \to \infty$. The choice
of $m$ is irrelevant to consistency of the estimator but has an influence on the
bias. The rate of convergence of these estimators, when known, is typically
slower than $\sqrt{n}$. Trimming of the lowest frequencies, which means taking the
$\ell$ first frequencies out, is sometimes used, but there is no theoretical need for
this practice, at least in the Gaussian case. See [HDB98]. For nonlinear series,
it is not yet known whether trimming may be needed in general.
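For comparison with the log-periodogram regression, a local Whittle estimate can be sketched by minimising the usual objective $R(d) = \log\big(m^{-1} \sum_{j \le m} \omega_j^{2d} I(\omega_j)\big) - 2d\, m^{-1} \sum_{j \le m} \log \omega_j$ over a grid. The grid search, the restriction of the grid to $(0, 1/2)$ and the function names are simplifying assumptions of this sketch.

```python
import math


def local_whittle_objective(d: float, freqs: list[float],
                            pgram: list[float], m: int) -> float:
    """R(d) computed from the first m Fourier frequencies."""
    mean_scaled = sum(w ** (2.0 * d) * i
                      for w, i in zip(freqs[:m], pgram[:m])) / m
    mean_log = sum(math.log(w) for w in freqs[:m]) / m
    return math.log(mean_scaled) - 2.0 * d * mean_log


def local_whittle_estimate(freqs: list[float], pgram: list[float],
                           m: int, n_grid: int = 499) -> float:
    """Grid minimiser of R(d) over n_grid equally spaced points in (0, 1/2)."""
    grid = [0.5 * k / (n_grid + 1) for k in range(1, n_grid + 1)]
    return min(grid,
               key=lambda d: local_whittle_objective(d, freqs, pgram, m))
```

When the ordinates follow an exact power law $C \omega^{-2d_0}$, Jensen's inequality shows that $R$ is minimised at $d_0$, so the grid search returns the grid point closest to it. A numerical optimiser would replace the grid in practice.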

In the following subsections, we review what is known, both theoretically
and empirically, about these and related methods for the different types of
nonlinear processes described previously.

We start by describing the behaviour of the renormalized DFTs at low
frequencies, that is, when the index $j$ of the frequency $\omega_j$ remains fixed as
$n \to \infty$.

3.1 Low-Frequency DFTs of Counts from Infinite-Variance
Durations

To the best of our knowledge there is no model in the literature for long 
memory processes of counts. Hence the question of parametric estimation 
has not arisen so far in this context. However, one may still be interested in 
semiparametric estimation of long memory in counts. We present the following 
result on the behavior of the Discrete Fourier Transforms (DFTs) of processes
of counts induced by infinite-variance durations that will be of relevance to
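us in understanding the behavior of the GPH estimator. Let $n$ denote the
number of observations on the counts, $\omega_j = 2\pi j/n$, and define

$$J_{\Delta N}(\omega_j) = \frac{1}{\sqrt{2\pi n}} \sum_{t'=1}^{n} \Delta N_{t'} e^{it'\omega_j}.$$
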
Assume that the distribution of the durations satisfies

$$P(\tau_k > x) \sim \ell(x) x^{-\alpha}, \quad x \to \infty, \qquad (17)$$

where $\ell(x)$ is a slowly varying function with $\lim_{x \to \infty} \ell(kx)/\ell(x) = 1$ for all $k > 0$ and
$\ell(x)$ is ultimately monotone at $\infty$.

Theorem 3. Let $\{\tau_k\}$ be i.i.d. random variables which satisfy (17) with $\alpha \in
(1, 2)$ and mean $\mu_\tau$. Then for each fixed $j$, $\ell(n)^{-1} n^{1/2 - 1/\alpha} J_{\Delta N}(\omega_j)$ converges in
distribution to a complex $\alpha$-stable distribution. Moreover, for each fixed $j$,
$\omega_j^d J_{\Delta N}(\omega_j) \to 0$ in probability, where $d = 1 - \alpha/2$.

The theorem implies that when $j$ is fixed, the normalized periodogram of
the counts, $\omega_j^{2d} I_{\Delta N}(\omega_j)$, converges in probability to zero. The degeneracy of
the limiting distribution of the normalized DFTs of the counts suggests that
the inclusion of the very low frequencies may induce negative finite-sample
bias in semiparametric estimators. In addition, the fact that the suitably
normalized DFT has an asymptotic stable distribution could further degrade
the finite-sample behavior of semiparametric estimators, more so perhaps for
the Whittle-likelihood-based estimators than for the GPH estimator since the
latter uses the logarithmic transformation.

By contrast, for linear long-memory processes, the normalized periodogram 
has a nondegenerate positive limiting distribution. See, for example, [TH94]. 

3.2 Low-Frequency DFTs of Counts from LMSD Durations 

We now study the behavior of the low-frequency DFTs of counts generated 
from finite-variance LMSD durations. 

Theorem 4. Let the durations $\{\tau_k\}$ follow an LMSD model with memory
parameter d. Then for each fixed j, $\omega_j^{d} J_{n,j}^{\Delta N}$ converges in distribution to a
zero-mean Gaussian random variable.

This result is identical to what would be obtained if the counts were a 
linear long-memory process, and stands in stark contrast to Theorem 3. The 
discrepancy between these two theorems suggests that the low frequencies will 
contribute far more bias to semiparametric estimates of d based on counts 
if the counts are generated by infinite-variance durations than if they were 
generated from LMSD durations.

3.3 Low and High Frequency DFTs of Shot-Noise Processes

Let X be either the renewal-reward process defined in (7) or the error duration
process (9). [HHS04], Theorem 4.1, have proved that Theorem 3 still holds,
i.e. $n^{1/2-1/\alpha} J_{n,j}^{X}$ converges in distribution to an $\alpha$-stable law, where $\alpha$ is the
tail index of the duration. This result can probably be extended to all
shot-noise processes for which convergence in distribution of the partial sum
process can be proved.

The DFTs of these processes have an interesting feature, related to the
slow growth/fast growth phenomenon. The high frequency DFTs, i.e. the
DFT $J_{n,j}^{X}$ computed at a frequency $\omega_j$ whose index j increases as $n^{p}$ for some
$p > 1 - 1/\alpha$, renormalized by the square root of the spectral density computed
at $\omega_j$, have a Gaussian weak limit. This is proved in Theorem 4.2 of [HHS04].

3.4 Estimation of the memory parameter of the LMSV and LMSD 
models 

We now discuss parametric and semiparametric estimation of the memory
parameter for the LMSV/LMSD models. Note that in both the LMSV and
LMSD models, $\log X_t^2$ can be expressed as the sum of a long memory signal
and iid noise. Specifically, we have

    $\log X_t^2 = \mu + h_t + u_t$,    (18)

where $\mu = E(\log v_t^2)$ and $u_t = \log v_t^2 - E(\log v_t^2)$ is a zero-mean iid series
independent of $\{h_t\}$. Since all the extant methodology for estimation for the
LMSV model exploits only the above signal plus noise representation, the
methodology continues to hold for the LMSD model.
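
The signal plus noise representation (18) can be illustrated numerically. The sketch below makes several assumptions that are not taken from the text: the observations are generated as $X_t = \exp(h_t/2) v_t$, the shocks $v_t$ are standard normal, and $\{h_t\}$ is a Gaussian ARFIMA(0, d, 0) series approximated by a truncated moving average. The helper name `arfima0d0` and the truncation length are hypothetical.

```python
import numpy as np

def arfima0d0(n, d, rng, trunc=2000):
    """Approximate ARFIMA(0, d, 0) sample via a truncated MA(infinity) filter.
    MA weights: psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k."""
    k = np.arange(1, trunc)
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    eps = rng.standard_normal(n + trunc - 1)
    return np.convolve(eps, psi, mode="valid")  # length n

# E[log v^2] for standard normal v, i.e. digamma(1/2) + log(2)
MU = -1.2703628

rng = np.random.default_rng(1)
n, d = 5000, 0.3
h = arfima0d0(n, d, rng)          # long memory signal (assumed Gaussian)
v = rng.standard_normal(n)        # iid shocks (assumed standard normal)
X = np.exp(h / 2) * v             # assumed form of the LMSV observations
u = np.log(v ** 2) - MU           # zero-mean iid noise in (18)
y = np.log(X ** 2)                # equals MU + h + u term by term
```

Only `y` would be observed in practice; the estimators discussed in this section work with `y` alone and treat `h` and `u` as latent.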

Assuming that $\{h_t\}$ is Gaussian, [DH01] derived asymptotic theory for the
log-periodogram regression estimator (GPH; [GPH83]) of d based on $\{\log X_t^2\}$.
This provides some justification for the use of GPH for estimating long
memory in volatility. Nevertheless, it can also be seen from Theorem 1 of [DH01]
that the presence of the noise term $\{u_t\}$ induces a negative bias in the GPH
estimator, which in turn limits the number m of Fourier frequencies which
can be used in the estimator while still guaranteeing $\sqrt{m}$-consistency and
asymptotic normality. This upper bound, $m = o[n^{4d/(4d+1)}]$, where n is the
sample size, becomes increasingly stringent as d approaches zero. The results
in [DH01] assume that d > 0 and hence rule out valid tests for the presence of
long memory in $\{h_t\}$. Such a test based on the GPH estimator was provided
and justified theoretically by [HS02].
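
For concreteness, here is a minimal sketch of a log-periodogram regression of the GPH type, without trimming. It regresses the log periodogram at the first m Fourier frequencies on $-2\log|1 - e^{-i\omega_j}|$ and returns the least squares slope as the estimate of d. The function name `gph` is hypothetical, and this is an illustrative implementation under those choices rather than the exact procedure used in the cited papers.

```python
import numpy as np

def gph(x, m):
    """Log-periodogram regression estimate of d from the first m Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    n = x.size
    j = np.arange(1, m + 1)
    w = 2 * np.pi * j / n
    # Periodogram at w_j; the modulus does not depend on the time-index origin.
    I = np.abs(np.fft.fft(x)[j]) ** 2 / (2 * np.pi * n)
    # Regressor: a spectral density proportional to |1 - exp(-i*w)|^(-2d)
    # gives log f(w) = const + d * reg.
    reg = -2 * np.log(np.abs(1 - np.exp(-1j * w)))
    reg_c = reg - reg.mean()
    return np.sum(reg_c * np.log(I)) / np.sum(reg_c ** 2)
```

In the setting of this section, `x` would be the series $\{\log X_t^2\}$, and the choice of `m` is subject to the bandwidth restrictions discussed in the text.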

[SP03] proposed a nonlinear log-periodogram regression estimator $\hat d_{NLP}$
of d, using Fourier frequencies 1, ..., m. They partially account for the noise
term $\{u_t\}$ through a first-order Taylor expansion about zero of the spectral
density of the observations, $\{\log X_t^2\}$. They establish the asymptotic normality
of $m^{1/2}(\hat d_{NLP} - d)$ under assumptions including $n^{-4d} m^{4d+1/2} \to \mathrm{Const}$. Thus,
$\hat d_{NLP}$, with a variance of order $n^{-4d/(4d+1/2)}$, converges faster than the GPH
estimator, but still arbitrarily slowly if d is sufficiently close to zero. [SP03]
also assumed that the noise and signal are Gaussian. This rules out most
LMSV/LMSD models, since $\{\log v_t^2\}$ is typically non-Gaussian.

For the LMSV/LMSD model, results analogous to those of [DH01] were
obtained by [Art04] for the GSE estimator, based once again on $\{\log X_t^2\}$. The
use of GSE instead of GPH allows the assumption that $\{h_t\}$ is Gaussian to be
weakened to linearity in a martingale difference sequence. [Art04] requires the
same restriction on m as in [DH01]. A test for the presence of long memory
in $\{h_t\}$ based on the GSE estimator was provided by [HMS05].

[HR03] proposed a local Whittle estimator of d, based on log squared
returns in the LMSV model. The local Whittle estimator, which may be viewed
as a generalized version of the GSE estimator, includes an additional term
in the Whittle criterion function to account for the contribution of the noise
term $\{u_t\}$ to the low frequency behavior of the spectral density of $\{\log X_t^2\}$.
The estimator is obtained from numerical optimization of the criterion
function. It was found in the simulation study of [HR03] that the local Whittle
estimator can strongly outperform GPH, especially in terms of bias when m
is large.

Asymptotic properties of the local Whittle estimator were obtained by
[HMS05], who allowed $\{h_t\}$ to be a long-memory process, linear in a martingale
difference sequence, with potential nonzero correlation with $\{u_t\}$. Under
suitable regularity conditions on the spectral density of $\{h_t\}$, [HMS05]
established the $\sqrt{m}$-consistency and asymptotic normality of the local Whittle
estimator, under certain conditions on m. If we assume that the short memory
component of the spectral density of $\{h_t\}$ is sufficiently smooth, then their
condition on m reduces to

    $\lim_{n\to\infty} \left( m^{-4d-1+\delta} n^{4d} + n^{-4} m^{5} \log^2(m) \right) = 0$    (19)

for some arbitrarily small $\delta > 0$.

The first term in (19) imposes a lower bound on the allowable value of m,
requiring that m tend to $\infty$ faster than $n^{4d/(4d+1)}$. It is interesting that [DH01],
under similar smoothness assumptions, found that for $m^{1/2}(\hat d_{GPH} - d)$ to be
asymptotically normal with mean zero, where $\hat d_{GPH}$ is the GPH estimator,
the bandwidth m must tend to $\infty$ at a rate slower than $n^{4d/(4d+1)}$. Thus for
any given d, the optimal rate of convergence for the local Whittle estimator
is faster than that for the GPH estimator.
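
To get a feel for how restrictive these bandwidth conditions are, the rate $n^{4d/(4d+1)}$ can be evaluated for a few values of d. The sketch below compares only orders of magnitude: constants, logarithmic factors and the small $\delta$ in (19) are ignored, and the rough upper limit $n^{4/5}$ is read off the second term of (19). The function names are hypothetical.

```python
def gph_max_bandwidth(n, d):
    """Order of the boundary rate n^(4d/(4d+1)): an upper limit on m for GPH,
    and a lower limit on m for the local Whittle estimator."""
    return n ** (4 * d / (4 * d + 1))

def local_whittle_window(n, d):
    """Rough (lower, upper) orders for m in (19), ignoring logs and delta."""
    return gph_max_bandwidth(n, d), n ** 0.8

n = 10_000
for d in (0.05, 0.25, 0.3545, 0.45):
    lower, upper = local_whittle_window(n, d)
    print(f"d = {d}: boundary rate about {lower:.0f}, upper rate about {upper:.0f}")
```

Since $4d/(4d+1) < 2/3$ for d < 1/2, the boundary rate always lies below $n^{4/5}$, and it shrinks quickly as d approaches zero.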

Fully parametric estimation in LMSV/LMSD models once again is based
on $\{\log X_t^2\}$ and exploits the signal plus noise representation (18). When $\{h_t\}$
and $\{u_t\}$ are independent, the spectral density of $\{\log X_t^2\}$ is simply the sum
of the spectral densities of $\{h_t\}$ and $\{u_t\}$, viz.

    $f_{\log X^2}(\lambda) = f_h(\lambda) + \sigma_u^2/(2\pi)$,    (20)

where $f_{\log X^2}$ is the spectral density of $\{\log X_t^2\}$, $f_h$ is the spectral density of
$\{h_t\}$ and $\sigma_u^2 = \mathrm{var}(u_t)$, all determined by the assumed parametric model. This
representation suggests the possibility of estimating the model parameters in
the frequency domain using the Whittle likelihood. Indeed, [Hos97] claims
that the resulting estimator is $\sqrt{n}$-consistent and asymptotically normal. We
believe that though the result provided in [Hos97] is correct, the proof is
flawed. [Deo95] has shown that the quasi-maximum likelihood estimator
obtained by maximizing the Gaussian likelihood of $\{\log X_t^2\}$ in the time domain
is $\sqrt{n}$-consistent and asymptotically normal.
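
A frequency-domain criterion based on (20) might be sketched as follows. The sketch assumes, purely for illustration, that $\{h_t\}$ is ARFIMA(0, d, 0) with innovation variance `sigma_eps2`, so that $f_h(\lambda) = \sigma_\varepsilon^2 (2\pi)^{-1} |2\sin(\lambda/2)|^{-2d}$; the text itself leaves the parametric model unspecified. The function names are hypothetical, and the objective is the usual Whittle sum of $\log f + I/f$ over positive Fourier frequencies.

```python
import numpy as np

def f_log_x2(lam, d, sigma_eps2, sigma_u2):
    """Spectral density (20): long memory signal plus flat noise spectrum.
    Assumes an ARFIMA(0, d, 0) signal; |1 - exp(-i*lam)| = 2*|sin(lam/2)|."""
    f_h = sigma_eps2 / (2 * np.pi) * np.abs(2 * np.sin(lam / 2)) ** (-2 * d)
    return f_h + sigma_u2 / (2 * np.pi)

def whittle_objective(params, y):
    """Whittle criterion: sum over Fourier frequencies of log f + I / f."""
    d, sigma_eps2, sigma_u2 = params
    y = np.asarray(y, dtype=float)
    n = y.size
    j = np.arange(1, (n - 1) // 2 + 1)
    lam = 2 * np.pi * j / n
    I = np.abs(np.fft.fft(y)[j]) ** 2 / (2 * np.pi * n)  # periodogram of y
    f = f_log_x2(lam, d, sigma_eps2, sigma_u2)
    return np.sum(np.log(f) + I / f)
```

A parametric estimate would then be obtained by minimizing `whittle_objective` over the parameter vector with a numerical optimizer, with `y` set to the observed $\{\log X_t^2\}$.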

One drawback of the latent-variable LMSV/LMSD models is that it is
difficult to derive the optimal predictor of $|X_t|^s$. In the LMSV model, $\{|X_t|^s\}$
for $s > 0$ serves as a proxy for volatility, while in the LMSD model, $\{X_t\}$
represents durations. A computationally efficient algorithm for optimal linear
prediction of such series was proposed in [DHL05], exploiting the
Preconditioned Conjugate Gradient (PCG) algorithm. In [CHL05], it is shown that the
computational cost of this algorithm is $O(n \log^{5/2} n)$, in contrast to the much
more expensive Levinson algorithm, which has cost of $O(n^2)$.

3.5 Simulations on the GPH Estimator for Counts 

We simulated i.i.d. durations from a positive stable distribution with tail index
$\alpha = 1.5$, with an implied d for the counts of .25. We also simulated durations
from an LMSD(1, d, 0) model with Weibull innovations, AR(1) parameter
of -.42, and d = .3545, as was estimated from actual tick-by-tick durations
in [DHH05]. The stable durations were multiplied by a constant c = 1.21 so
that the mean duration matches that found in actual data. For the LMSD
durations, we used c = 1. One unit in the rescaled durations is taken to
represent one second. Tables 1 and 2, for the stable and LMSD cases respectively,
present the GPH estimates based on the resulting counts for different values
of $\Delta t$, using n = 10,000, $m = n^{0.5}$ and $m = n^{0.8}$. For the stable case, the
bias was far more strongly negative for the smaller value of m, whereas for
the LMSD case, the bias did not change dramatically with m. This is
consistent with the discussion in Section 3.2, and also with the averaged log-log
periodogram plots presented in Figure 1, where the averaging is taken over a
large number of replications, and all positive Fourier frequencies are
considered, j = 1, ..., n/2. The plot for the stable durations (upper panel) shows
a flat slope at the low frequencies. For this process, using more frequencies
in the regression seems to mitigate the negative bias induced by the flatness
in the lower frequencies, as indicated by the less biased estimates of d when
$m = n^{0.8}$.
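
The step from simulated durations to counts per clock-time interval of length $\Delta t$ can be sketched as below. This is an illustration only: the function name `counts_from_durations` is hypothetical, and the usage example draws Pareto durations with tail index 1.5 as a simple heavy-tailed stand-in, not the positive stable or LMSD durations used in the simulations reported here.

```python
import numpy as np

def counts_from_durations(durations, dt, n):
    """Number of events in each interval ((i-1)*dt, i*dt], i = 1, ..., n,
    where event times are the cumulative sums of the durations."""
    event_times = np.cumsum(durations)
    edges = dt * np.arange(n + 1)
    # side="right" counts the event times that are <= each edge, i.e. N(edge).
    cumulative = np.searchsorted(event_times, edges, side="right")
    return np.diff(cumulative)

# Usage with a stand-in heavy-tailed duration law (assumption of this sketch):
rng = np.random.default_rng(0)
durations = 1.0 + rng.pareto(1.5, size=20_000)   # Pareto, tail index 1.5, minimum 1
counts = counts_from_durations(durations, dt=10.0, n=1_000)
```

The durations must cover the whole observation window of length n times dt; here every duration is at least 1, so 20,000 durations are more than enough for a window of length 10,000.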

For the LMSD process, if the conjecture is correct then the counts should
have the same memory parameter as the durations, d = .3545. Assuming that
this is the case, we did not find severe negative bias in the GPH estimators
on the counts, though the estimate of d seems to increase with $\Delta t$ in the case
when $m = n^{0.5}$. The averaged log-log periodogram plot presented in the
lower panel of Figure 1 shows a near-perfect straight line across all frequencies,
which is quite different from the pattern we observed in the case of counts
based on stable durations. The straight-line relationship here is consistent
with the bias results in our LMSD simulations, and with the discussion in
Section 3.2.

Statistical properties of $\hat d_{GPH}$ and the choice of m for Gaussian
long-memory time series have been discussed in recent literature. [Rob95b] showed
for Gaussian processes that the GPH estimator is $\sqrt{m}$-consistent and
asymptotically normal if an increasing number of low frequencies L is trimmed from
the regression of the log periodogram on log frequency. [HDB98] showed that
trimming can be avoided for Gaussian processes. In our simulations, we did
not use any trimming. There is as yet no theoretical justification for the GPH
estimator in the current context since the counts are clearly non-Gaussian,
and presumably constitute a nonlinear process. It is not clear whether
trimming would be required for such a theory, but our simulations and theoretical
results suggest that in some situations trimming may be helpful, while in
others it may not be needed.



Table 1. GPH estimators for counts with different $\Delta t$. Counts generated from
iid stable durations with skewness parameter $\beta = 0.8$ and tail index $\alpha = 1.5$.
The corresponding memory parameter for counts is d = .25. We generated 500
replications each with sample size n = 10,000. The number of frequencies in the log
periodogram regression was $m = n^{0.8} = 1585$ and $m = \sqrt{n} = 100$. t-values marked
with * reject the null hypothesis, d = 0.25 in favor of d < 0.25.

                          m = n^{0.5}                      m = n^{0.8}
  Δt (c = 1.21)    Mean(d_GPH)    t-Value          Mean(d_GPH)    t-Value
  5 min              0.1059       -17.65*            0.2328        -5.77*
  10 min             0.0744       -23.08*            0.2212        -8.31*
  20 min             0.0715       -23.23*            0.2186        -7.75*


Table 2. Mean of the GPH estimators for counts with different $\Delta t$. Counts generated
from LMSD durations with Weibull(1, $\gamma$) shocks. The number of frequencies in the
log periodogram regression was $m = \sqrt{n}$ and $m = n^{0.8}$. We used d = .3545 and
$\gamma = 1.3376$ for our simulations. We simulated 200 replications of the counts, each
with sample size n = 10,000. t-values marked with * reject the null hypothesis,
d = 0.3545 in favor of d < 0.3545.

                          m = n^{0.5}                      m = n^{0.8}
  Δt (c = 1)       Mean(d_GPH)    t-Value          Mean(d_GPH)    t-Value
  5 min              0.3458        -1.76*            0.3471        -6.49*
  30 min             0.3873         3.45*            0.3469        -3.59*
  60 min             0.3923         4.05*            0.3478        -3.20*

Fig. 1. Averaged log-log periodogram plots for the counts generated from iid
stable and LMSD durations. [Upper panel: log periodogram vs. log frequency,
iid stable durations. Lower panel: LMSD durations.]

3.6 Estimation of the memory parameter of the Infinite Source
Poisson process

Due to the underlying Poisson point process, the Infinite Source Poisson
process is a very mathematically tractable model. Computations are very easy
and in particular, convenient formulas for cumulants of integrals along paths
of the process are available. This allows one to derive the theoretical properties of
estimators of the Hurst index or memory parameter. [FRS05] have defined an
estimator of the Hurst index of the Infinite Source Poisson process (with random
transmission rate) related to the GSE and proved its consistency and rate
of convergence. Instead of using the DFTs of the process, so-called wavelet
coefficients are defined as follows. Let $\psi$ be a measurable compactly supported
function on $\mathbb{R}$ such that $\int \psi(s)\,ds = 0$. For $j \in \mathbb{N}$ and $k = 0, \dots, 2^j - 1$, define




If (13) holds, then $E[w_{j,k}] = 0$ and $\mathrm{var}(w_{j,k}) = L(2^j) 2^{(2-\alpha)j} = L(2^j) 2^{2dj}$,
where $\alpha$ is the tail index of the durations, $d = 1 - \alpha/2$ is the memory parameter
and L is a slowly varying function at infinity. This scaling property makes it
natural to define a contrast function

w(d') = log (E Wlfc)6 4 2- 2d 's- *) + Sd '^m , 

where $\Delta$ is the admissible set of coefficients, which depends on the interval
of observation and the support of the function $\psi$. The estimator of d is then
$\hat d = \arg\min_{d' \in (0,1/2)} W(d')$. [FRS05] have proved under some additional
technical assumptions that this estimator is consistent. The rate of convergence
can be obtained, but the asymptotic distribution is not known, though it is
conjectured to be Gaussian, if the set $\Delta$ is properly chosen.

Note in passing that here again, the slow growth/fast growth phenomenon 
arises. It can be shown, if the shocks and durations are independent, that for 
fixed k, 2^~ a ^/ 2 Wj t k converges to an a-stable distribution, but if k tends to 
infinity at a suitable rate, 2~ d ^vjj k converges to a complex Gaussian distri- 
bution. This slow growth/fast growth phenomenon is certainly a very deep
property of these processes that deserves further study.

Appendix

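Proof (of Theorem 3). For simplicity, we set the clock-time spacing $\Delta t = 1$.
Define

    $S_{\tau,n}(\theta) = \sum_{k=1}^{\lfloor n\theta \rfloor} \tau_k, \quad 0 \le \theta \le 1,$

    $S_{\Delta N,n}(\theta) = \sum_{t'=1}^{\lfloor n\theta \rfloor} \Delta N_{t'}, \quad 0 \le \theta \le 1.$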

Since $\alpha < 2$ and $\{\tau_k\}$ is an i.i.d. sequence, by the functional central limit
theorem (FCLT) for random variables in the domain of attraction of a stable
law (see [EKM97, Theorem 2.4.10]), $l(n) n^{-1/\alpha} \{S_{\tau,n}(\theta) - \lfloor n\theta \rfloor \mu_\tau\}$ converges
weakly in $\mathcal{D}([0,1])$ to an $\alpha$-stable motion, for some slowly varying function l.
Now define

    $U_n(\theta) = (2\pi)^{-1/2} l(n) n^{-1/\alpha} \{S_{\Delta N,n}(\theta) - \lfloor n\theta \rfloor / \mu_\tau\}$.

By the equivalence of FCLTs for the counting process and its associated partial
sums of duration process (see [IW71]), $U_n$ also converges weakly in $\mathcal{D}([0,1])$
to an $\alpha$-stable motion, say S. Summation by parts yields, for any nonzero
Fourier frequency $\omega_j$ (with fixed j > 0)

    $l(n) n^{1/2-1/\alpha} J_{n,j}^{\Delta N}
        = (2\pi)^{-1/2} l(n) n^{-1/\alpha} \sum_{t'=1}^{n} (\Delta N_{t'} - 1/\mu_\tau) \, e^{i t' \omega_j}$

    $   = \sum_{t'=1}^{n} \{U_n(t'/n) - U_n((t'-1)/n)\} \, e^{i t' \omega_j}
        = \int_0^1 e^{2i\pi j x} \, dU_n(x)$.

Hence by the continuous mapping theorem

    $l(n) n^{1/2-1/\alpha} J_{n,j}^{\Delta N} \xrightarrow{d} \int_0^1 e^{2i\pi j x} \, dS(x)$,

which is a stochastic integral with respect to a stable motion, hence has a 
stable law. 

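To prove the second statement of the theorem, note that for fixed j and
as $n \to \infty$, $f(\omega_j) \sim l_1(n) \omega_j^{-2d}$ for some slowly varying function $l_1$, so

    $\frac{J_{n,j}^{\Delta N}}{f^{1/2}(\omega_j)}
        = \frac{l(n) n^{1/2-1/\alpha} J_{n,j}^{\Delta N}}{l(n) n^{1/2-1/\alpha} f^{1/2}(\omega_j)}
        \sim C \, \tilde l(n) \, n^{1/\alpha+\alpha/2-3/2} \; l(n) n^{1/2-1/\alpha} J_{n,j}^{\Delta N}$,    (21)

where $C = (2\pi j)^{d}$ and $\tilde l(n) = l(n)^{-1} l_1(n)^{-1/2}$ is slowly varying.
Since $1/\alpha + \alpha/2 - 3/2 < 0$, we have $\tilde l(n) n^{1/\alpha+\alpha/2-3/2} \to 0$. Hence by Slutsky's
Theorem, (21) converges to zero. □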
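
The sign condition on the exponent used at the end of this proof can be checked directly by factoring; the following short derivation is added as a worked step and uses only the assumption $1 < \alpha < 2$ from Theorem 3.

```latex
\frac{1}{\alpha} + \frac{\alpha}{2} - \frac{3}{2}
  = \frac{\alpha^{2} - 3\alpha + 2}{2\alpha}
  = \frac{(\alpha - 1)(\alpha - 2)}{2\alpha} < 0
  \qquad \text{for } 1 < \alpha < 2 .
```

The exponent vanishes at both endpoints $\alpha = 1$ and $\alpha = 2$, so the convergence to zero can be slow when $\alpha$ is close to either boundary.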

Proof (of Theorem 4). Let $S_n(t) = n^{-H} \sum_{k=1}^{\lfloor nt \rfloor} (\tau_k - E\,\tau_k)$, $t \in [0,1]$. It is
shown in Surgailis and Viano (2002) that $S_n \xrightarrow{d} B_H$ in $\mathcal{D}([0,1])$, where
$B_H(t)$ is fractional Brownian motion with Hurst parameter H = d + 1/2.
Thus, by Iglehart and Whitt (1971), it follows that the counting process,
centered and normalized by $n^{-H}$, converges weakly to $A B_H$ in $\mathcal{D}([0,1])$,
where A is a nonzero constant. The result follows as above by the continuous
mapping theorem and summation by parts. □



